AITopics | human-in-the-loop reinforcement learning

Collaborating Authors

human-in-the-loop reinforcement learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Selective Progress-Aware Querying for Human-in-the-Loop Reinforcement Learning

Muraleedharan, Anujith, H, Anamika J

arXiv.org Artificial IntelligenceSep-26-2025

Human feedback can greatly accelerate robot learning, but in real-world settings, such feedback is costly and limited. Existing human-in-the-loop reinforcement learning (HiL-RL) methods often assume abundant feedback, limiting their practicality for physical robot deployment. In this work, we introduce SPARQ, a progress-aware query policy that requests feedback only when learning stagnates or worsens, thereby reducing unnecessary oracle calls. We evaluate SPARQ on a simulated UR5 cube-picking task in PyBullet, comparing against three baselines: no feedback, random querying, and always querying. Our experiments show that SPARQ achieves near-perfect task success, matching the performance of always querying while consuming about half the feedback budget. It also provides more stable and efficient learning than random querying, and significantly improves over training without feedback. These findings suggest that selective, progress-based query strategies can make HiL-RL more efficient and scalable for robots operating under realistic human effort constraints.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2509.20541

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

Mapping Neural Signals to Agent Performance, A Step Towards Reinforcement Learning from Neural Feedback

Santaniello, Julia, Russell, Matthew, Jiang, Benson, Sassaroli, Donatello, Jacob, Robert, Sinapov, Jivko

arXiv.org Artificial IntelligenceJun-17-2025

Implicit Human-in-the-Loop Reinforcement Learning (HITL-RL) is a methodology that integrates passive human feedback into autonomous agent training while minimizing human workload. However, existing methods often rely on active instruction, requiring participants to teach an agent through unnatural expression or gesture. We introduce NEURO-LOOP, an implicit feedback framework that utilizes the intrinsic human reward system to drive human-agent interaction. This work demonstrates the feasibility of a critical first step in the NEURO-LOOP framework: mapping brain signals to agent performance. Using functional near-infrared spectroscopy (fNIRS), we design a dataset to enable future research using passive Brain-Computer Interfaces for Human-in-the-Loop Reinforcement Learning. Participants are instructed to observe or guide a reinforcement learning agent in its environment while signals from the prefrontal cortex are collected. We conclude that a relationship between fNIRS data and agent performance exists using classical machine learning techniques. Finally, we highlight the potential that neural interfaces may offer to future applications of human-agent interaction, assistive AI, and adaptive autonomous systems.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2506.12636

Country: North America > United States (0.30)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

PulseRide: A Robotic Wheelchair for Personalized Exertion Control with Human-in-the-Loop Reinforcement Learning

Zahid, Azizul, Poudel, Bibek, Scott, Danny, Scott, Jason, Crouter, Scott, Li, Weizi, Swaminathan, Sai

arXiv.org Artificial IntelligenceJun-6-2025

Maintaining an active lifestyle is vital for quality of life, yet challenging for wheelchair users. For instance, powered wheelchairs face increasing risks of obesity and deconditioning due to inactivity. Conversely, manual wheelchair users, who propel the wheelchair by pushing the wheelchair's handrims, often face upper extremity injuries from repetitive motions. These challenges underscore the need for a mobility system that promotes activity while minimizing injury risk. Maintaining optimal exertion during wheelchair use enhances health benefits and engagement, yet the variations in individual physiological responses complicate exertion optimization. To address this, we introduce PulseRide, a novel wheelchair system that provides personalized assistance based on each user's physiological responses, helping them maintain their physical exertion goals. Unlike conventional assistive systems focused on obstacle avoidance and navigation, PulseRide integrates real-time physiological data-such as heart rate and ECG-with wheelchair speed to deliver adaptive assistance. Using a human-in-the-loop reinforcement learning approach with Deep Q-Network algorithm (DQN), the system adjusts push assistance to keep users within a moderate activity range without under- or over-exertion. We conducted preliminary tests with 10 users on various terrains, including carpet and slate, to assess PulseRide's effectiveness. Our findings show that, for individual users, PulseRide maintains heart rates within the moderate activity zone as much as 71.7 percent longer than manual wheelchairs. Among all users, we observed an average reduction in muscle contractions of 41.86 percent, delaying fatigue onset and enhancing overall comfort and engagement. These results indicate that PulseRide offers a healthier, adaptive mobility solution, bridging the gap between passive and physically taxing mobility options.

machine learning, pulseride, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2506.05056

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.15)
North America > United States > New York > New York County (0.15)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.95)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Music Generation using Human-In-The-Loop Reinforcement Learning

Justus, Aju Ani

arXiv.org Artificial IntelligenceJan-25-2025

This paper presents an approach that combines Human-In-The-Loop Reinforcement Learning (HITL RL) with principles derived from music theory to facilitate real-time generation of musical compositions. HITL RL, previously employed in diverse applications such as modelling humanoid robot mechanics and enhancing language models, harnesses human feedback to refine the training process. In this study, we develop a HILT RL framework that can leverage the constraints and principles in music theory. In particular, we propose an episodic tabular Q-learning algorithm with an epsilon-greedy exploration policy. The system generates musical tracks (compositions), continuously enhancing its quality through iterative human-in-the-loop feedback. The reward function for this process is the subjective musical taste of the user.

composition, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2501.15304

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report (1.00)
Overview (0.93)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Accelerating the Convergence of Human-in-the-Loop Reinforcement Learning with Counterfactual Explanations

Karalus, Jakob, Lindner, Felix

arXiv.org Artificial IntelligenceAug-3-2021

The capability to interactively learn from human feedback would enable robots in new social settings. For example, novice users could train service robots in new tasks naturally and interactively. Human-in-the-loop Reinforcement Learning (HRL) addresses this issue by combining human feedback and reinforcement learning (RL) techniques. State-of-the-art interactive learning techniques suffer from slow convergence, thus leading to a frustrating experience for the human. This work approaches this problem by extending the existing TAMER Framework with the possibility to enhance human feedback with two different types of counterfactual explanations. We demonstrate our extensions' success in improving the convergence, especially in the crucial early phases of the training.

convergence, counterfactual, human-in-the-loop reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

2108.01358

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.86)

Add feedback

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

Keramati, Ramtin, Brunskill, Emma

arXiv.org Artificial IntelligenceApr-2-2020

Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer that is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on algorithmic foundation of how to help the system designer choose the set of sensors or features to define the observation space used by reinforcement learning agent. We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that is sufficient to capture a (near) optimal policy. To do so we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.

algorithm, observation space, optimal policy, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3320435.3320471

2004.01223

Country:

North America > United States > California > Santa Clara County > Stanford (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Middle East > Cyprus > Larnaka > Larnaca (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Where to Add Actions in Human-in-the-Loop Reinforcement Learning

Mandel, Travis (University of Washington) | Liu, Yun-En (Enlearn) | Brunskill, Emma (Carnegie Mellon University) | Popović, Zoran (University of Washington)

AAAI ConferencesFeb-14-2017

In order for reinforcement learning systems to learn quickly in vast action spaces such as the space of all possible pieces of text or the space of all images, leveraging human intuition and creativity is key. However, a human-designed action space is likely to be initially imperfect and limited; furthermore, humans may improve at creating useful actions with practice or new information. Therefore, we propose a framework in which a human adds actions to a reinforcement learning system over time to boost performance. In this setting, however, it is key that we use human effort as efficiently as possible, and one significant danger is that humans waste effort adding actions at places (states) that aren't very important. Therefore, we propose Expected Local Improvement (ELI), an automated method which selects states at which to query humans for a new action. We evaluate ELI on a variety of simulated domains adapted from the literature, including domains with over a million actions and domains where the simulated experts change over time. We find ELI demonstrates excellent empirical performance, even in settings where the synthetic "experts" are quite poor.

machine learning, reinforcement, reinforcement learning, (15 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.68)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback